nlp_architect.solutions.set_expansion package

Submodules

nlp_architect.solutions.set_expansion.expand_server module

class nlp_architect.solutions.set_expansion.expand_server.MyTCPHandler(request, client_address, server)[source]

Bases: socketserver.BaseRequestHandler

A simple server that loads the w2v model and handles expand requests from the UI.

static annotate(text, seed)[source]
handle()[source]
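
As a rough illustration of the request-handler pattern that MyTCPHandler builds on, the sketch below subclasses socketserver.BaseRequestHandler and answers one request per connection. Everything beyond the base class is an assumption made for the example: the comma-separated wire format, the names ExpandHandler, fake_expand, query and demo, and the stand-in expansion logic. None of it is taken from expand_server, which loads a real model instead.

```python
import socket
import socketserver
import threading


def fake_expand(seed_terms):
    """Hypothetical stand-in for a model-backed expansion call."""
    return [term + "_expanded" for term in seed_terms]


class ExpandHandler(socketserver.BaseRequestHandler):
    """Reads one comma-separated seed line and replies with expanded terms."""

    def handle(self):
        # self.request is the connected TCP socket for this client.
        data = self.request.recv(1024).decode("utf-8").strip()
        seed_terms = [t for t in data.split(",") if t]
        reply = ",".join(fake_expand(seed_terms))
        self.request.sendall(reply.encode("utf-8"))


def query(host, port, seed_terms):
    """Send seed terms to the server and return the reply as a list."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(",".join(seed_terms).encode("utf-8"))
        return sock.recv(1024).decode("utf-8").split(",")


def demo():
    # Port 0 asks the operating system for any free port.
    with socketserver.TCPServer(("127.0.0.1", 0), ExpandHandler) as server:
        host, port = server.server_address
        thread = threading.Thread(target=server.serve_forever, daemon=True)
        thread.start()
        try:
            return query(host, port, ["apple", "banana"])
        finally:
            server.shutdown()  # stop serve_forever before closing the socket
```

Calling demo() starts the toy server on a local port, sends two seed terms, and returns the handler's reply.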

nlp_architect.solutions.set_expansion.prepare_data module

Script that prepares the input corpus for np2vec training: it runs the NP extractor on the corpus and marks the extracted NPs.

nlp_architect.solutions.set_expansion.prepare_data.extract_noun_phrases(docs, nlp_parser, chunker)[source]
nlp_architect.solutions.set_expansion.prepare_data.get_group_norm(spacy_span)[source]

Given a span, determine its group and return the normalized text representing the group.

Parameters:spacy_span (spacy.tokens.Span) – the span whose group is determined
nlp_architect.solutions.set_expansion.prepare_data.load_parser(chunker)[source]
nlp_architect.solutions.set_expansion.prepare_data.mark_noun_phrases(corpus_file, marked_corpus_file, nlp_parser, lines_count, chunker, mark_char='_', grouping=False)[source]
nlp_architect.solutions.set_expansion.prepare_data.merge_groups(np, old_id, diff_id)[source]
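
To make the idea of marking noun phrases concrete, here is a minimal sketch that joins the tokens of each noun phrase with a mark character so that the phrase becomes a single token for embedding training. It is not the code of mark_noun_phrases: the helper name mark_spans, the (start, end) span format, and the exact shape of the marked token are assumptions for illustration, and the real script obtains its spans from a parser or chunker rather than from a hand-written list.

```python
def mark_spans(tokens, np_spans, mark_char="_"):
    """Join the tokens of each noun-phrase span into a single marked token.

    np_spans is a list of (start, end) token indices, end exclusive,
    assumed to be sorted and non-overlapping (a hypothetical input format).
    """
    marked = []
    position = 0
    for start, end in np_spans:
        marked.extend(tokens[position:start])             # tokens before the NP
        marked.append(mark_char.join(tokens[start:end]))  # the NP as one token
        position = end
    marked.extend(tokens[position:])                      # tokens after the last NP
    return " ".join(marked)


sentence = "the deep neural network learns word embeddings".split()
marked_sentence = mark_spans(sentence, [(1, 4), (5, 7)])
# marked_sentence == "the deep_neural_network learns word_embeddings"
```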

nlp_architect.solutions.set_expansion.set_expand module

class nlp_architect.solutions.set_expansion.set_expand.SetExpand(np2vec_model_file, binary=False, word_ngrams=False, grouping=False, light_grouping=False, grouping_map_dir=None)[source]

Bases: object

Set expansion module, given a trained np2vec model.

expand(seed, topn=500)[source]

Given a seed of terms, return the expanded set of terms.

Parameters:
  • seed – seed terms
  • topn – maximal number of expanded terms to return
Returns:

up to topn expanded terms and their probabilities
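
One common way to expand a seed over an embedding model is to rank every other term by its cosine similarity to the seed. The sketch below does this over a plain dictionary of toy vectors. It is an assumption about the general technique, not a description of how SetExpand.expand scores terms: the names toy_expand and cosine, the centroid averaging, and the vectors themselves are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def toy_expand(vectors, seed, topn=500):
    """Return up to topn (term, score) pairs ranked by similarity to the seed."""
    known = [vectors[t] for t in seed if t in vectors]
    if not known:
        return []
    dim = len(known[0])
    # Represent the seed by the average of its term vectors (an assumption).
    centroid = [sum(vec[i] for vec in known) / len(known) for i in range(dim)]
    scored = [
        (term, cosine(centroid, vec))
        for term, vec in vectors.items()
        if term not in seed
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


toy_vectors = {
    "apple": [1.0, 0.0],
    "pear": [0.9, 0.1],
    "banana": [0.8, 0.2],
    "car": [0.0, 1.0],
}
expanded = toy_expand(toy_vectors, ["apple"], topn=2)
```

With these toy vectors, the two fruit terms rank ahead of "car", and topn caps the length of the returned list.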

get_group(term)[source]
get_seed_id(seed)[source]
get_vocab()[source]

Return the vocabulary as the list of terms.

Returns:the list of terms.
in_vocab(term)[source]
seed2term_similarity(seed_id, term_id)[source]

Compute cosine similarity between the seed terms and a term.

Parameters:
  • seed_id – seed term ids
  • term_id – the term id

Returns:Similarity between the seed terms and the term
similarity(terms, seed, threshold)[source]
term2id(term, suffix=True)[source]

Given a term, return its id.

Parameters:term (str) – term (noun phrase)
Returns:its id (if it is part of the model)
term2term_similarity(term_id_1, term_id_2)[source]

Compute cosine similarity between two term ids.

Parameters:
  • term_id_1 – first term id
  • term_id_2 – second term id

Returns:Similarity between the first and second term ids
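
For reference, cosine similarity is the dot product of two vectors divided by the product of their lengths. The sketch below computes it for plain Python lists. The function name cosine_similarity and the zero-vector convention are assumptions for this example; the library method itself takes term ids and looks the vectors up in the model.

```python
import math


def cosine_similarity(u, v):
    """Return dot(u, v) / (|u| * |v|), or 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen for this sketch
    return dot / (norm_u * norm_v)


orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])  # vectors at a right angle
parallel = cosine_similarity([1.0, 2.0], [2.0, 4.0])    # same direction, different length
```

Orthogonal vectors score 0 and vectors pointing in the same direction score 1 (up to floating-point rounding), regardless of their lengths.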

Module contents